Element Defined Sequence Complexity Reduction

ABSTRACT

A method for providing defined mixtures of nucleic acids is described. In certain embodiments, the method uses oligonucleotide probes attached to a solid support as a sequence-specific affinity agent to isolate and facilitate the amplification of defined nucleic acid fragment mixtures.

FIELD OF THE INVENTION

The invention relates generally to fractionation and isolation of nucleic acids, which are useful in biological assays and other applications. More specifically, the invention relates to obtaining defined nucleic acid fragments from complex nucleic acid mixtures.

BACKGROUND OF THE INVENTION

The isolation of discrete, sequence-defined genetic elements from complex genomic samples is an essential step in many genetic analysis protocols, including de novo sequencing, re-sequencing, gene expression, epigenetic state analyses, and genetic variation discovery and scoring (e.g. SNPs and STRs). Defined sequence elements or fragments that share a common sequence element can be isolated using a common primer such as oligo dT to target the poly-A tails of messenger RNA (Chow et al., (1988) Anal. Biochem., 175; 63). Protocols to isolate and amplify defined mixtures of fragments from complex genomic samples currently rely upon defined primer pairs and some form of amplification protocol such as PCR, isothermal amplification or LCR. However, these methods suffer significant limitations since the degree of multiplexing is generally limited to only 10 to 20 primer pairs which must be co-optimized for a given reaction condition. As a result, genetic analysis protocols that require the interrogation of a large number of sites or sequence elements within a complex sample mixture such as SNP genotyping or comparative genomic hybridization (CGH) have relied upon various genome complexity reduction methods that utilize various LCR, PCR and random priming techniques that are not element defined (Kinzler & Vogelstein, Nucleic Acids Res (1989) 17; 3645, Telenius et al., Genomics (1992) 13; 1718, Kristjansson et al., Nature Genetics (1994) 6; 19, Lucito et al., Proc. Natl. Acad. Sci. USA (1998) 95; 4487, Kennedy et al., (2003), Nat. Biotechnol. 21; 1233 & Bignell et al., (2004) Genome Research 14; 287).

What is needed is a convenient method for providing defined mixtures of nucleic acids from complex samples.

SUMMARY OF THE INVENTION

The invention addresses the aforementioned deficiencies in the art, and provides methods for isolating defined mixtures of nucleic acid fragments from complex nucleic acid samples. This method, termed “Element Defined Sequence Complexity Reduction” (EDSCR), utilizes oligonucleotides attached to a solid support such as a microarray as a highly parallel sequence-specific affinity agent to isolate and facilitate the amplification of defined nucleic acid fragment mixtures derived from complex genomic samples.

The present invention provides a method of isolating probe-defined nucleic acid fragments from a sample of nucleic acids. The method includes obtaining fragmented nucleic acids from the sample of nucleic acids. A hybridization reaction is then performed, wherein the hybridization reaction includes contacting the fragmented nucleic acids with a solid support having probes bound thereto to result in complementary fragments being retained on the solid support via the probes. The method further includes recovering the probe-defined nucleic acid fragments, wherein the recovering includes separating complementary fragments from the solid support.

Additional objects, advantages, and novel features of this invention shall be set forth in part in the descriptions and examples that follow and in part will become apparent to those skilled in the art upon examination of the following specifications or may be learned by the practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instruments, combinations, compositions and methods described herein and particularly pointed out in the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the invention will be understood from the description of representative embodiments of the method herein and the disclosure of illustrative apparatus for carrying out the method, taken together with the Figures, wherein:

FIG. 1 schematically illustrates the general method for performing element defined sequence complexity reduction (EDSCR).

FIG. 2 a shows the EDSCR method for creating a standard vector pool for Mismatch Repair Detection (MRD).

FIG. 2 b shows the EDSCR method for creating a pooled experimental cohort fragment mixture for MRD.

FIG. 3 depicts the EDSCR method for reducing the complexity of a genomic sample for DNA sequence analysis.

To facilitate understanding, identical reference numerals have been used, where practical, to designate corresponding elements that are common to the Figures. Figure components are not drawn to scale.

DETAILED DESCRIPTION

Before the invention is described in detail, it is to be understood that unless otherwise indicated this invention is not limited to particular materials, reagents, reaction materials, manufacturing processes, or the like, as such may vary. It is also to be understood that the terminology used herein is for purposes of describing particular embodiments only, and is not intended to be limiting. It is also possible in the present invention that steps may be executed in a different sequence where this is logically possible. However, the sequence described below is preferred.

It must be noted that, as used in the specification and the appended claims, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a solid support” includes a plurality of solid supports. In this specification and in the claims that follow, reference will be made to a number of terms that shall be defined to have the following meanings unless a contrary intention is apparent.

“Optional” or “optionally” means that the subsequently described circumstance may or may not occur, so that the description includes instances where the circumstance occurs and instances where it does not. For example, if an operation such as an amplification reaction is optionally performed it means that the amplification reaction may or may not be performed, and, thus, the description includes embodiments wherein the amplification reaction is performed and embodiments wherein the amplification reaction is not performed.

The term “oligomer” is used herein to indicate a chemical entity that contains a plurality of monomers. As used herein, the terms “oligomer” and “polymer” are used interchangeably. Examples of oligomers and polymers include polydeoxyribonucleotides (DNA), polyribonucleotides (RNA), other nucleic acids that are C-glycosides of a purine or pyrimidine base, polypeptides (proteins) or polysaccharides (starches, or polysugars), as well as other chemical entities that contain repeating units of like chemical structure.

The term “nucleic acid” as used herein means a polymer composed of nucleotides, e.g., deoxyribonucleotides or ribonucleotides, or compounds produced synthetically (e.g., PNA as described in U.S. Pat. No. 5,948,902 and the references cited therein) which can hybridize with naturally occurring nucleic acids in a sequence specific manner analogous to that of two naturally occurring nucleic acids, e.g., can participate in Watson-Crick base pairing interactions. Nucleic acid fragments are nucleic acids obtained from longer nucleic acids which have been fragmented, e.g. by shearing or by enzymatic cleavage or any other method of cutting nucleic acids.

The terms “ribonucleic acid” and “RNA” as used herein mean a polymer composed of ribonucleotides.

The terms “deoxyribonucleic acid” and “DNA” as used herein mean a polymer composed of deoxyribonucleotides.

The term “oligonucleotide” as used herein denotes single stranded nucleotide multimers of from about 10 to about 100 nucleotides and up to about 200 nucleotides in length.

The term “sample” as used herein relates to a material or mixture of materials, typically, although not necessarily, in fluid form, containing one or more components of interest.

The terms “nucleoside” and “nucleotide” are intended to include those moieties that contain not only the known purine and pyrimidine bases, but also other heterocyclic bases that have been modified. Such modifications include methylated purines or pyrimidines, acylated purines or pyrimidines, alkylated riboses or other heterocycles. In addition, the terms “nucleoside” and “nucleotide” include those moieties that contain not only conventional ribose and deoxyribose sugars, but other sugars as well. Modified nucleosides or nucleotides also include modifications on the sugar moiety, e.g., wherein one or more of the hydroxyl groups are replaced with halogen atoms or aliphatic groups, or are functionalized as ethers, amines, or the like.

The phrase “oligonucleotide bound to a solid support” refers to an oligonucleotide or mimetic thereof, e.g., PNA, that is immobilized on a surface of a solid substrate, e.g. in a feature or spot of an array, where the substrate can have a variety of configurations, e.g., a sheet, bead, or other structure. In certain embodiments, the collections of features of oligonucleotides employed herein are present on a surface of the same planar support, e.g., in the form of an array.

The term “array” encompasses the term “microarray” and refers to an ordered array presented for binding to nucleic acids and the like. Arrays, as described in greater detail below, are generally made up of a plurality of distinct or different features. The term “feature” is used interchangeably herein with the terms: “features,” “feature elements,” “spots,” “addressable regions,” “regions of different moieties,” “surface or substrate immobilized elements” and “array elements,” where each feature is made up of oligonucleotides bound to a surface of a solid support, also referred to as substrate immobilized nucleic acids. The oligonucleotides bound to the solid support are referenced as “probes” or “probe molecules”, and a nucleic acid in a mobile phase that is complementary to a probe is referenced as a “target” or a “target molecule.”

The term “stringent conditions” as used herein refers to conditions that are compatible to produce binding pairs of nucleic acids, e.g., surface bound and solution phase nucleic acids, of sufficient complementarity to provide for the desired level of specificity while being less compatible to the formation of binding pairs between binding members of insufficient complementarity to provide for the desired specificity. Stringent conditions are the summation or combination (totality) of both hybridization and wash conditions.

“Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization (e.g., as in array, Southern or Northern hybridizations, or immobilization to a solid support via a support-bound probe) are sequence dependent, and are different under different experimental parameters.

“Probe-defined,” as in “probe-defined fragments,” references a mixture of one or more nucleic acids, e.g. oligonucleotides or polynucleotides, that have been isolated by hybridizing to a collection of probes bound to a solid support, e.g. a mixture of beads or an array substrate, e.g. under stringent hybridization conditions. The isolation of the probe-defined fragments typically results in a reduced complexity mixture by removal of nucleic acids which do not bind to the array, e.g. under stringent hybridization conditions. The probes on the array may be selected to bind to any desired sequence, and typically will be selected to isolate particular fragments from a more complex mixture of fragments, e.g. a fragmented genomic DNA or nucleic acids derived from any source from which it is desired to isolate a reduced complexity set of fragments.

The present invention discloses a general method for isolating probe-defined nucleic acid fragments from complex nucleic acid samples. This method, termed “Element Defined Sequence Complexity Reduction” (EDSCR), utilizes oligonucleotides attached to a solid support such as a microarray as a highly parallel sequence-specific affinity agent to isolate and facilitate the amplification of defined nucleic acid fragment mixtures derived from complex genomic samples. The uses of the resulting defined fragment mixtures include, but are not limited to: i) generating cloned libraries having defined complexity, ii) preparing nucleic acid fragment mixtures for sequence analysis, including de novo sequencing, re-sequencing, SNP genotyping, and methylation state analysis, and iii) analyzing gene expression, including discovery of rare or novel expressed genes and identifying alternative mRNA splice-site variants.

Microarrays for Use in Fragment Isolation:

A “microarray” includes any one-dimensional, two-dimensional or substantially two-dimensional (as well as a three-dimensional) arrangement of addressable regions (i.e., features, e.g., in the form of spots) bearing nucleic acids, particularly oligonucleotides or synthetic mimetics thereof (i.e., the oligonucleotides defined above), and the like. Where the microarrays are microarrays of nucleic acids, the nucleic acids may be adsorbed, physisorbed, chemisorbed, or covalently attached to the microarrays at any point or points along the nucleic acid chain.

Any given substrate may carry one, two, four or more microarrays disposed on a front surface of the substrate. Depending upon the use, any or all of the microarrays may be the same or different from one another and each may contain multiple spots or features. A typical microarray may contain one or more, including more than two, more than ten, more than one hundred, more than one thousand, more than ten thousand features, or even more than one hundred thousand features, in an area of less than 20 cm² or even less than 10 cm², e.g., less than about 5 cm², including less than about 1 cm², less than about 1 mm², e.g., 100 μm², or even smaller. For example, features may have widths (that is, diameter, for a round spot) in the range from 10 μm to 1.0 cm. In other embodiments each feature may have a width in the range of 1.0 μm to 1.0 mm, usually 5.0 μm to 500 μm, and more usually 10 μm to 200 μm. Non-round features may have area ranges equivalent to that of circular features with the foregoing width (diameter) ranges. At least some, or all, of the features are of different compositions (for example, when any repeats of each feature composition are excluded the remaining features may account for at least 5%, 10%, 20%, 50%, 95%, 99% or 100% of the total number of features). Each feature typically includes one or more oligonucleotides bound to a substrate. Some arrays may have more than one oligonucleotide per feature, e.g. two, three, at least about 5, at least about 10, at least about 50, or at least about 100, or more, oligonucleotides per feature of the array, wherein each oligonucleotide has a different sequence from the other oligonucleotides of that feature. The microarray may have up to 1000 or more different probes, e.g. up to 5000 or more, up to 10,000 or more, up to 50,000 or more, or up to 100,000 or more different probes.
Inter-feature areas will typically (but not essentially) be present which do not carry any nucleic acids (or other biopolymer or chemical moiety of a type of which the features are composed). Such inter-feature areas typically will be present where the microarrays are formed by processes involving drop deposition of reagents but may not be present when, for example, photolithographic microarray fabrication processes are used. It will be appreciated though, that the inter-feature areas, when present, could be of various sizes and configurations.

Each microarray may cover an area of less than 200 cm², or even less than 50 cm², 5 cm², 1 cm², 0.5 cm², or 0.1 cm². In certain embodiments, the substrate carrying the one or more microarrays will be shaped generally as a rectangular solid (although other shapes are possible), having a length of more than 4 mm and less than 150 mm, usually more than 4 mm and less than 80 mm, more usually less than 20 mm; a width of more than 4 mm and less than 150 mm, usually less than 80 mm and more usually less than 20 mm; and a thickness of more than 0.01 mm and less than 5.0 mm, usually more than 0.1 mm and less than 2 mm and more usually more than 0.2 mm and less than 1.5 mm, such as more than about 0.8 mm and less than about 1.2 mm. With microarrays that are read by detecting fluorescence, the substrate may be of a material that emits low fluorescence upon illumination with the excitation light. Additionally in this situation, the substrate may be relatively transparent to reduce the absorption of the incident illuminating laser light and subsequent heating if the focused laser beam travels too slowly over a region. For example, the substrate may transmit at least 20%, or 50% (or even at least 70%, 90%, or 95%), of the illuminating light incident on the front as may be measured across the entire integrated spectrum of such illuminating light or alternatively at 532 nm or 633 nm.

Microarrays can be fabricated using drop deposition from pulse-jets of either nucleic acid precursor units (such as monomers) in the case of in situ fabrication, or the previously obtained nucleic acid. Such methods are described in detail in, for example, the previously cited references including U.S. Pat. No. 6,242,266, U.S. Pat. No. 6,232,072, U.S. Pat. No. 6,180,351, U.S. Pat. No. 6,171,797, U.S. Pat. No. 6,323,043, U.S. patent application Ser. No. 09/302,898 filed Apr. 30, 1999 by Caren et al., and the references cited therein. As already mentioned, these references are incorporated herein by reference. Other drop deposition methods can be used for fabrication, as previously described herein. Also, instead of drop deposition methods, photolithographic microarray fabrication methods may be used. Inter-feature areas need not be present particularly when the microarrays are made by photolithographic methods as described in those patents.

In certain embodiments of particular interest, in situ prepared microarrays are employed. In situ prepared oligonucleotide microarrays, e.g., nucleic acid microarrays, may be characterized by having surface properties of the substrate that differ significantly between the feature and inter-feature areas. Specifically, such microarrays may have high surface energy, hydrophilic features and low surface energy, hydrophobic interfeature regions. Whether a given region, e.g., feature or interfeature region, of a substrate has a high or low surface energy can be readily determined by determining the region’s “contact angle” with water, as known in the art and further described in copending application Ser. No. 10/449,838 to Peck et al., filed May 30, 2003, the disclosure of which is herein incorporated by reference. Other features of in situ prepared microarrays that make such microarray formats of particular interest in certain embodiments of the present invention include, but are not limited to: feature density, oligonucleotide density within each feature, feature uniformity, low intra-feature background, low inter-feature background, e.g., due to hydrophobic interfeature regions, fidelity of oligonucleotide elements making up the individual features, microarray/feature reproducibility, and the like. The above benefits of in situ produced microarrays assist in maintaining adequate sensitivity while operating under stringency conditions required to accommodate highly complex samples.

The process of the current invention may be employed on arrays fabricated on any solid support having a surface to which chemical entities may bind. The solid support will typically comprise materials that provide support for the deposited material (e.g. the probes and any chemical modification of the surface of the solid support) and endure the conditions of the deposition process and of any subsequent treatment or handling or processing that may be encountered in the use of the solid support. Suitable solid supports may have a variety of forms and compositions and may derive from naturally occurring materials, naturally occurring materials that have been synthetically modified, or synthetic materials. Examples of suitable materials include, but are not limited to, nitrocellulose, glasses, silicas, teflons, and metals (for example, gold, platinum, and the like). Suitable materials also include polymeric materials, including plastics (for example, polytetrafluoroethylene, polypropylene, polystyrene, polycarbonate, and blends thereof, and the like), polysaccharides such as agarose (e.g., that available commercially as Sepharose®, from Pharmacia) and dextran (e.g., those available commercially under the tradenames Sephadex® and Sephacyl®, also from Pharmacia), polyacrylamides, polystyrenes, polyvinyl alcohols, copolymers of hydroxyethyl methacrylate and methyl methacrylate, and the like.

In the description below, embodiments of the present invention are mostly described with reference to performing EDSCR using a microarray. However, it should be appreciated that in particular embodiments, EDSCR may be performed using a mixture of beads, each bead having a defined oligonucleotide probe analogous to that of a microarray in which each bead corresponds to a feature of the microarray. All microarray design elements are applicable to the design of the bead mixture with respect to feature density (number of beads in the mixture), probe length and location with respect to the target nucleic acids. The beads may be made of any material(s) that are compatible with the present method and that provide a surface for attachment of the probes.

A method of isolating probe-defined nucleic acid fragments from a sample of nucleic acids in accordance with the present invention is now described. The method includes obtaining fragmented nucleic acids from the sample of nucleic acids. A hybridization reaction is then performed, wherein the hybridization reaction includes contacting the fragmented nucleic acids with a solid support having probes bound thereto to result in complementary fragments being retained on the solid support via the probes. The method further includes recovering the probe-defined nucleic acid fragments, wherein the recovering includes separating complementary fragments from the solid support. Methods in accordance with the present invention will be set forth herein with reference to the Figures and the description below.

Proceeding now with reference to FIG. 1, an embodiment of the current invention is described. In FIG. 1 the general EDSCR process includes obtaining fragmented nucleic acids. In the illustrated embodiment, to obtain the fragmented nucleic acids, a sample of nucleic acids is provided, e.g. a genomic DNA sample 102. The method of the present invention can be performed using samples of nucleic acids derived from any number of sources including, but not limited to: i) genomic DNA derived from tissue, blood, whole cells, ii) pathogen DNA derived from tissue, blood, whole cells, iii) viral or phage DNA, iv) cloned DNA libraries, v) synthetic DNA libraries, vi) cDNA derived from the reverse transcription of RNA including mRNA (pre and post processed), vii) any other source of complex mixtures of nucleic acids that can be hybridized to probes bound to a solid support, and viii) any other source of nucleic acids from which it is desired to isolate nucleic acid fragments having reduced complexity.

The method illustrated in FIG. 1 proceeds as follows for obtaining fragmented nucleic acids: In the illustrated embodiment, the genomic DNA sample 102 is fragmented (indicated by arrow 104) to result in fragmented genomic DNA 106. The fragmentation 104 may be accomplished via mechanical or enzymatic methods depending upon the particular requirements of the downstream application. For example, if the downstream application can accommodate random fragments whose boundaries do not have to be location defined, then various random enzymatic or mechanical fragmentation methods can be employed. These include treatment with DNase I, shearing through a syringe having a defined gauge needle, nebulization or sonication (see for example; Davidson, P. F. (1959) Proc. Natl. Acad. Sci USA, 45, 1560, Schriefer et al, (1990) Nucleic Acids Res., 18, 7455, Cavalieri and Rosenberg, (1959) J. Am. Chem. Soc., 81, 5136). Some applications however may require that the location of fragmentation be defined. In this case, the DNA can be digested with a restriction endonuclease. Depending on the choice of restriction endonuclease, the fragmented DNA may be blunt-ended or may have 3′- or 5′-overhanging termini. Selection and use of restriction endonuclease will be apparent to one of ordinary skill in the art given the disclosure herein. When using restriction endonucleases, the average length of the fragments in the final digested mixture will depend upon the enzyme used. For example, a restriction endonuclease that has a four base long recognition sequence will, on average, digest the DNA into smaller fragments (4⁴=256-mers) than a restriction endonuclease that has a six base long recognition sequence (4⁶=4,096-mers average length).
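The fragment-length arithmetic above can be illustrated with a minimal sketch. It assumes a uniform, random base composition (so that a recognition site of length n occurs on average once every 4ⁿ bases), and the helper names `expected_fragment_length` and `digest`, as well as the `cut_offset` parameter, are hypothetical and chosen only for illustration.

```python
def expected_fragment_length(site_length: int, alphabet_size: int = 4) -> int:
    """Average fragment length for a recognition site of the given length.

    Assumes all bases are equally likely and independent, which real
    genomes only approximate.
    """
    return alphabet_size ** site_length


def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut a single-strand sequence at every occurrence of `site`.

    `cut_offset` is the position within the site where the cut falls
    (illustrative only; it does not model overhangs on a duplex).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    print(expected_fragment_length(4))  # four-base recognition sequence
    print(expected_fragment_length(6))  # six-base recognition sequence
    print(digest("AAGAATTCTT", "GAATTC", 1))
```

In this sketch a six-base site cut one base into the site splits the toy sequence into two fragments; actual fragment length distributions depend on the base composition of the sample.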

A number of interdependent factors should be considered in establishing the experimental design, depending on the desired application. These factors include the sequence complexity of the nucleic acids in the initial sample of nucleic acids, the desired degree of sequence complexity reduction, the number of features on the microarray, the probe lengths and the particular application in which the final reduced complexity fragment mixture will be used (see discussion below). For most embodiments, the average fragment length should be between 50 and 20,000 nucleotides, and more typically should be between 100 and 2,000 nucleotides. In general, shorter fragments will give higher degrees of hybridization specificity since they will minimize any non-specific and fragment-fragment interactions. However, when a particular genomic region does not possess a sufficiently unique sequence to serve as a probe hybridization site at or near the region of interest, it will be necessary to increase the average fragment length such that a unique site is present within the region of interest. For example, the average length of an exon in a human gene is only 120 base-pairs. If the application were directed at the isolation of defined exons from a sample comprising the entire human genome complexity, a probe hybridization site that possesses the necessary specificity is unlikely to be found frequently within the 120 base-pair target site. Thus, it may be necessary to target the exon with a probe some distance from the exon, thereby requiring the average fragment length to be concomitantly increased.

In the embodiment illustrated in FIG. 1, the fragmented genomic DNA 106 resulting from the fragmentation 104 are “intermediate fragments” (that is, they are subjected to further processing before being used in the next portion of the invention). After fragmentation, it may be desirable to introduce a defined or universal primer site onto the termini of the intermediate fragments. This can be accomplished by the ligation of a double stranded DNA adaptor duplex onto the termini of the genomic fragments using DNA ligase (J. Sambrook, E. F. Fritsch, & T. Maniatis, “Molecular Cloning; A Laboratory Manual” (1989), Cold Spring Harbor Laboratory Press, USA). In embodiments in which a restriction endonuclease is used to generate the fragments, the adaptor duplex may possess cohesive ends compatible with that of the genomic fragment mixture (see Saiki et al., Science (1985) 230:1350 & Lucito et al., Proc. Natl. Acad. Sci. USA (1998) 95; 4487). In the embodiment illustrated in FIG. 1, these intermediate fragments (the fragmented genomic DNA 106) are ligated (indicated by arrow 114) to DNA adaptor duplexes 112 to result in duplex terminated fragments 116. The duplex terminated fragments 116 obtained from the ligation provide the fragmented nucleic acids in the embodiment illustrated in FIG. 1.
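As a rough illustration of how adaptor ligation places a common primer site on both termini, the following sketch models only the top strand of a blunt-ended fragment. The function names and the example sequences are hypothetical, and cohesive ends, ligation efficiency and adaptor orientation are deliberately ignored.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def ligate_adaptors(fragment: str, adaptor: str) -> str:
    """Top strand of a fragment after the same adaptor duplex is joined to
    both termini (simplified: blunt ends, one fixed orientation).

    The adaptor reads 5'->3' at the left end; at the right end the top
    strand carries its reverse complement, so a single primer matching
    `adaptor` can prime synthesis on either strand.
    """
    return adaptor + fragment + reverse_complement(adaptor)


if __name__ == "__main__":
    print(ligate_adaptors("GGG", "ACT"))
```

Because both strands of the product end with the complement of the same adaptor sequence at their 3′ termini, a single universal primer would, under these simplifying assumptions, suffice to amplify every adaptor-terminated fragment.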

In particular embodiments the adaptor duplex 112 may comprise a unique sequence element of sufficient length to serve as a primer site for subsequent amplification of all isolated fragments. The length of the primer site within the adaptor duplex will be defined by the sequence complexity and/or sequence composition of the restriction digest mixture and is preferably between 10 and 30 nucleotides in length. It may also be desirable for some applications (see below) that the adaptor duplex comprises the recognition site for the restriction endonuclease used to generate the genomic fragments in the first step. The sequence of the restriction endonuclease recognition site on the adaptor duplex may be such that the site used to digest the genomic mixture is destroyed upon ligation of the adaptor to the genomic fragment mixture (see U.S. Pat. No. 5,093,245 to Keith et al.). In certain embodiments, the primer site also includes another restriction and/or nicking endonuclease site for use in isothermal amplification methods. In some embodiments the adaptor duplex also contains an identifier or tag sequence that can be used to either identify the identifier-sequence containing fragments or to facilitate isolation of the tag-sequence containing fragments from a more complex fragment mixture. To isolate the tag-sequence containing fragments, an affinity separation technique may be employed, wherein the affinity separation technique includes contacting the tag-sequence containing fragments with an affinity matrix having a complementary tag-recognition sequence bound to a matrix support, and then eluting the tag-sequence containing fragments. In certain embodiments the adaptor duplex may also possess a reactive moiety that allows the adaptor-containing fragment to be attached to a surface of a substrate. Such moieties include biotin, reactive amines, aldehydes, esters and the like.
In such embodiments, the surface of the substrate will typically have a complementary active moiety for reacting with the reactive moiety of the adaptor duplex.

In embodiments in which an adaptor duplex 112 is ligated to the intermediate fragments (e.g. the fragmented genomic DNA 106), it may be desirable to remove any remaining unligated adaptor duplexes 112 from the ligation reaction mixture to purify the duplex terminated fragments 116. This may be accomplished by passing the ligation reaction mixture through a gel filtration column or filter membrane which possesses a suitable molecular weight cut off. The adaptor-containing fragments can also be purified by using an affinity matrix (e.g. gel or bead) containing an oligonucleotide probe complementary to a common sequence in the adaptor duplex, e.g. a tag sequence. The duplex terminated fragments 116 obtained from the purification provide the fragmented nucleic acids in the embodiment employing such a purification.

A method in accordance with the invention thus includes obtaining fragmented nucleic acids, as described above with regard to the embodiment illustrated in FIG. 1. In particular embodiments, the fragmented nucleic acids may comprise adaptor duplex terminated fragments, as described above. In certain embodiments, the fragmented nucleic acids may be obtained essentially by fragmenting a sample of nucleic acids, thereby resulting in the fragmented nucleic acids (without addition of adaptor duplexes), as described above.

In an embodiment in accordance with the present invention, the method of isolating probe-defined nucleic acid fragments includes, after obtaining the fragmented nucleic acids, performing a hybridization reaction. Performing a hybridization reaction includes contacting the fragmented nucleic acids with a solid support having probes bound thereto to result in complementary fragments being retained on the solid support via the probes. In typical embodiments, before contacting the fragmented nucleic acids with a solid support having probes bound thereto, the fragmented nucleic acids are denatured to facilitate hybridization to the probes bound to the solid substrate. The fragmented nucleic acids may be denatured by any suitable process, e.g. heating to 95° C. for 10 minutes and then quickly cooling to denature the double-stranded fragments, or treating with mild base (pH 10) followed by neutralization and precipitation with ethanol (J. Sambrook, E. F. Fritsch, & T. Maniatis, “Molecular Cloning; A Laboratory Manual” (1989), Cold Spring Harbor Laboratory Press, USA).

The denatured fragment mixture is then fractionated by hybridizing the mixture to a microarray comprising a predefined set of oligonucleotide probes having sequences that are complementary to the DNA fragments that are to be isolated from the sample. Referring again to the embodiment illustrated in FIG. 1, the fragmented nucleic acids obtained as described above (the duplex terminated fragments 116) are contacted (indicated by arrow 124) with a solid support 120 having probes 122 bound thereto to result in complementary fragments 126 being retained on the solid support 120 via the probes 122. “Complementary fragments” refers to the portion of the fragmented nucleic acids that is specifically hybridized to the probes on the solid support and is retained on the solid support while subjected to the hybridization and wash conditions used while performing the hybridization reaction.
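
As a rough illustration of the fractionation step, the following sketch models probe-directed capture as exact reverse-complement matching between probes and denatured fragments. It is a toy model only: the function names `reverse_complement` and `capture` and all sequences are hypothetical, and real retention depends on the hybridization and wash stringency rather than on exact string matching.

```python
# Toy model of probe-directed capture; not a thermodynamic simulation.
# All names and sequences below are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def capture(fragments, probes):
    """Split single-stranded fragments into (retained, washed_off) lists.

    A fragment counts as retained when it contains the reverse complement
    of at least one probe, i.e. the probe could base-pair with it.
    """
    targets = [reverse_complement(p) for p in probes]
    retained, washed_off = [], []
    for frag in fragments:
        if any(t in frag for t in targets):
            retained.append(frag)
        else:
            washed_off.append(frag)
    return retained, washed_off


probes = ["ACGTTGCA", "GGATCCAA"]
fragments = [
    "TTTTTGCAACGTAAAA",  # contains the reverse complement of probe 1
    "CCCCTTGGATCCGGGG",  # contains the reverse complement of probe 2
    "ATATATATATATATAT",  # no region complementary to either probe
]
retained, washed_off = capture(fragments, probes)
print(len(retained), "retained;", len(washed_off), "washed off")
```

In this model the retained list plays the role of the complementary fragments 126 and the washed-off list that of the non-specifically bound fragments 132.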

The hybridization and wash conditions used while performing the hybridization reaction are selected to provide the required hybridization specificity. The hybridization and wash stringencies are typically selected to provide stable hybridization between the microarray probes and their complementary fragments while providing for destabilization of all non-specific probe-fragment and higher order fragment-fragment interactions.

Stringent hybridization conditions typically used while performing the hybridization reaction may include, e.g., hybridization in a buffer comprising 50% formamide, 5×SSC, and 1% SDS at 42° C., or hybridization in a buffer comprising 5×SSC and 1% SDS at 65° C., both with a wash of 0.2×SSC and 0.1% SDS at 65° C. Exemplary stringent hybridization conditions can also include a hybridization in a buffer of 40% formamide, 1 M NaCl, and 1% SDS at 37° C., and a wash in 1×SSC at 45° C. Alternatively, hybridization to filter-bound DNA in 0.5 M NaHPO₄, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1×SSC/0.1% SDS at 68° C. can be employed. Yet additional stringent hybridization conditions include hybridization at 60° C. or higher and 3×SSC (450 mM sodium chloride/45 mM sodium citrate) or incubation at 42° C. in a solution containing 30% formamide, 1 M NaCl, 0.5% sodium sarcosine, 50 mM MES, pH 6.5. Those of ordinary skill in the art will readily recognize that alternative but comparable hybridization and wash conditions can be utilized to provide stringent hybridization conditions.
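
The SSC strengths quoted above can be converted to salt concentrations by scaling from the 3×SSC figures given in the text (450 mM sodium chloride/45 mM sodium citrate). The helper below is a minimal sketch of that arithmetic; the function name `ssc_composition` is an illustrative assumption, and the calculation ignores other buffer components such as SDS or formamide.

```python
# Salt concentrations of an n x SSC buffer, scaled linearly from the
# 3xSSC composition quoted in the text. Function name is hypothetical.

SSC_3X_MM = {"sodium chloride": 450.0, "sodium citrate": 45.0}


def ssc_composition(strength: float) -> dict:
    """Return millimolar concentrations for a buffer of `strength` x SSC."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return {name: mm * strength / 3.0 for name, mm in SSC_3X_MM.items()}


for strength in (5, 3, 1, 0.2, 0.1):
    comp = ssc_composition(strength)
    print(
        f"{strength}xSSC: {comp['sodium chloride']:.1f} mM sodium chloride, "
        f"{comp['sodium citrate']:.1f} mM sodium citrate"
    )
```

Lower SSC strength means lower ionic strength, which is why the wash steps listed above (0.2×SSC, 0.1×SSC) are more stringent than the hybridization buffers (5×SSC, 3×SSC) at a comparable temperature.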

Hybridization specificity can be further enhanced by including various carrier DNAs and/or RNAs in the hybridization mixture. The incorporation of the unique primer sites into the starting fragment mixture allows for the inclusion of modest amounts of various carrier or blocking nucleic acids in the hybridization buffer, since any carrier nucleic acids present in the final eluted fraction (see below) would neither be amplified in the final step of the EDSCR process nor interfere with the downstream application in which the fragment mixture is to be used. For example, Cot-1 DNA and/or random N-mer oligonucleotides (e.g. random 25-mers) can be included at a concentration ranging between 0.05 and 0.5 μg/mL in the hybridization mixture.

Alternatively, the hybridization mixture could contain RNA such as yeast or E. coli tRNA at a concentration ranging between 0.05 and 0.5 μg/mL. In this case, any remaining carrier RNA can be easily removed by treatment of the sample with DNase-free RNase (J. Sambrook, E. F. Fritsch, & T. Maniatis, “Molecular Cloning; A Laboratory Manual” (1989), Cold Spring Harbor Laboratory Press, USA). Also, the Cot-1 DNA could be replaced with an RNA equivalent generated by in vitro transcription, which could be easily removed by treatment of the sample with DNase-free RNase as described above.

In typical embodiments, such as that illustrated in FIG. 1, after contacting 124 the fragmented nucleic acids (e.g. the fragmented genomic DNA 106) with a solid support 120 having probes 122 bound thereto, non-specifically bound fragments 132 are washed (indicated by arrow 134) off the solid support 120.

The wash conditions are selected to provide the desired level of stringency for the hybridization reaction. The wash conditions may include stringent wash conditions, such as, e.g.: a salt concentration of about 0.02 molar at pH 7 and a temperature of at least about 50° C. or about 55° C. to about 60° C.; or, a salt concentration of about 0.15 M NaCl at 72° C. for about 15 minutes; or, a salt concentration of about 0.2×SSC at a temperature of at least about 50° C. or about 55° C. to about 60° C. for about 15 to about 20 minutes; or, the hybridization complex is washed twice with a solution with a salt concentration of about 2×SSC containing 0.1% SDS at room temperature for 15 minutes and then washed twice with 0.1×SSC containing 0.1% SDS at 68° C. for 15 minutes; or, equivalent conditions. Stringent conditions for washing can also be, e.g., 0.2×SSC/0.1% SDS at 42° C.

Because the fragments can be amplified following elution from the solid support (see below), the hybridization and wash steps can be optimized for specificity at the expense of washing away some of the desired complementary fragments. This is possible because even in very high density microarrays (~50,000 features within a 1 cm×1 cm area), the size of each feature is such that the number of probe molecules in each feature (between 10,000 and 100,000 probe molecules per μm²) will ensure that a sufficient number of fragment molecules will remain hybridized even under very stringent conditions.
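
The scale of this argument can be checked with a short calculation. The sketch below takes the figures quoted in the text (about 50,000 features in a 1 cm×1 cm area, 10,000 to 100,000 probe molecules per μm²) and assumes, for simplicity, that the features tile the whole area with no gaps; that tiling assumption and the function name `probes_per_feature` are not from the source, so the result is an upper-bound estimate.

```python
# Order-of-magnitude estimate of probe molecules per microarray feature.
# Assumes features tile the full 1 cm x 1 cm area (an upper bound).

AREA_UM2 = 1e4 * 1e4                        # 1 cm x 1 cm in square micrometres
FEATURES = 50_000                           # feature count quoted in the text
PROBE_DENSITY_PER_UM2 = (10_000, 100_000)   # range quoted in the text


def probes_per_feature(area_um2, n_features, density_per_um2):
    """Probe molecules in one feature, given a uniform surface density."""
    feature_area_um2 = area_um2 / n_features
    return feature_area_um2 * density_per_um2


low, high = (
    probes_per_feature(AREA_UM2, FEATURES, d) for d in PROBE_DENSITY_PER_UM2
)
print(f"feature area: {AREA_UM2 / FEATURES:.0f} square micrometres")
print(f"probe molecules per feature: {low:.1e} to {high:.1e}")
```

Under these assumptions each feature carries on the order of tens of millions of probe molecules or more, so even a large fractional loss during stringent washing can leave many captured fragment molecules available for amplification.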

The embodiment illustrated in FIG. 1 shows the use of a microarray for performing a hybridization reaction with the fragmented nucleic acids. It is contemplated that the hybridization reaction may be performed using a mixture of beads as the solid support to which the probes are bound, instead of a microarray. In such embodiments adjustment of the reaction conditions and other experimental parameters will be apparent given ordinary skill in the art and the disclosure herein.

A method in accordance with the invention thus includes performing a hybridization reaction, wherein the hybridization reaction comprises contacting the fragmented nucleic acids with a solid support having probes bound thereto to result in complementary fragments being retained on the solid support via the probes, as described above. The fragmented nucleic acids may be denatured before the contacting occurs. In particular embodiments, the hybridization reaction is performed under stringent conditions, as described above. In certain embodiments, various carrier DNAs and/or RNAs may be added to the fragmented nucleic acids before the contacting occurs, as described above.

In an embodiment in accordance with the present invention, the method of isolating probe-defined nucleic acid fragments includes, after performing the hybridization reaction, recovering probe-defined nucleic acid fragments. Recovering probe-defined nucleic acid fragments includes separating complementary fragments from the solid support. Referring again to FIG. 1, in the illustrated embodiment after the hybridization reaction is performed (including contacting 124 and washing 134), the complementary fragments 126 being retained on the solid support 120 are separated (indicated by arrow 144) to provide the complementary fragments in solution 146, released from the solid support 120.

Complementary fragments may be separated from the solid support using any effective method to release the complementary fragments from the solid support and retrieve the complementary fragments. In an embodiment, separating the complementary fragments from the solid support includes eluting the complementary fragments with an elution buffer under conditions effective to release the complementary fragments from the solid support, and retrieving an eluant containing the complementary fragments. Conditions effective for releasing the complementary fragments typically involve washing the solid support in a low salt buffer at elevated temperatures and/or pH. This would include, but not be limited to, buffer conditions comprising 0.5×TE (pH 8 to 9.5), 0.05×SSC/0.1% SDS at a temperature ranging between 70° C. and 95° C. Denaturants such as formamide or urea will typically be included in the elution buffer to efficiently elute the complementary fragments.

In another embodiment, separating the complementary fragments from the solid support includes cleaving probes from the array to release the complementary fragments into solution, and retrieving the solution containing the complementary fragments. In one embodiment, probes are cleaved from a solid substrate such as a glass microarray substrate by treating the microarray with a basic solution (pH ~10) followed by neutralization of the solution and precipitation with ethanol to retrieve the complementary fragments. In certain embodiments the probes are bound to the solid support via a chemically cleavable moiety which is cleaved to release the probes from the solid support (releasing the complementary fragments, also) after performing the hybridization reaction. DNA fragments may also be obtained by first isolating the feature-containing region of the solid support (e.g. a microarray), crushing the microarray into a powder and then using the fragment-containing powder directly in the fragment amplification reaction (below) or directly in the desired application.

In particular embodiments, the complementary fragments separated from the solid support provide the probe-defined nucleic acid fragments without further processing. In other embodiments, recovering the probe-defined nucleic acid fragments includes further processing of the complementary fragments to yield the probe-defined nucleic acid fragments. In the embodiment illustrated in FIG. 1, the complementary fragments are subjected to further processing (indicated by arrow 154) to provide the probe-defined nucleic acid fragments 156. Various methods of manipulating nucleic acids are known in the art; such methods of manipulating nucleic acids that are adapted to processing the complementary fragments to result in the probe-defined nucleic acid fragments may be employed for further processing of the complementary fragments as described herein. In certain embodiments, the further processing may include purifying the complementary fragments separated from the solid support; in such embodiments, recovering probe-defined nucleic acid fragments further comprises, after separating the complementary fragments from the solid support, purifying the complementary fragments to provide the probe-defined nucleic acid fragments. Purifying the complementary fragments may be accomplished via any suitable method, including precipitation (e.g. precipitation with ethanol under conditions of high salt concentration), affinity chromatography, ion exchange chromatography, size exclusion chromatography, or any other method adapted to purifying the complementary fragments to yield the probe-defined nucleic acid fragments. In some embodiments, the purified fragments may then be amplified in an amplification reaction or may be used directly in an application which requires probe-defined nucleic acid fragments.

In some embodiments in which recovering the probe-defined nucleic acid fragments includes further processing of the complementary fragments, the further processing may include amplifying the complementary fragments. In such embodiments, recovering probe-defined nucleic acid fragments further comprises, after separating the complementary fragments from the solid support, amplifying the complementary fragments to provide the probe-defined nucleic acid fragments. Any one of a number of standard primer-dependent amplification methods, including PCR, LCR, and isothermal amplification (Walker et al., (1992) Proc. Natl. Acad. Sci USA 89; 392), may be used for amplifying the complementary fragments. The primers used in a primer-dependent amplification (e.g. by PCR) will typically be complementary to a primer site incorporated into an adaptor duplex ligated to the fragment termini during the ligation reaction, as described above. If an isothermal amplification is used, the restriction and/or nicking endonuclease site will be incorporated into the adaptor sequence (above).
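
The role of the adaptor-borne primer site can be sketched in a few lines. The toy model below assumes that the same adaptor is ligated to both ends of each duplex, so that each strand begins with the primer-site sequence and ends with its reverse complement; the primer sequence, the function names, and the example strands are all hypothetical. It illustrates why carrier nucleic acids lacking the adaptor are not expected to be amplified.

```python
# Toy illustration: only strands flanked by the adaptor primer site are
# treated as amplifiable. Sequences and names are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_amplifiable(strand: str, primer_site: str) -> bool:
    """True if the strand carries the primer site at both termini.

    With the same adaptor on both ends, a strand reads
    primer_site + insert + reverse_complement(primer_site).
    """
    return strand.startswith(primer_site) and strand.endswith(
        reverse_complement(primer_site)
    )


PRIMER_SITE = "GATTACAGGCC"  # hypothetical adaptor primer-site sequence

eluted = {
    "adaptor-ligated fragment": PRIMER_SITE
    + "ACGTACGTTTGA"
    + reverse_complement(PRIMER_SITE),
    "carrier DNA (no adaptor)": "ACGTACGTTTGAACGTACGT",
}

for name, strand in eluted.items():
    print(name, "->", "amplified" if is_amplifiable(strand, PRIMER_SITE) else "not amplified")
```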

In particular embodiments, the method of the present invention provides probe-defined nucleic acid fragments. Characteristics of the probe-defined nucleic acid fragments isolated by methods described herein will typically depend on a number of factors, including the source of the sample, the number and identity of the probes bound to the solid support, and the conditions used in the hybridization and wash, as well as other factors. Given the disclosure herein, it lies within ordinary skill in the art to determine these factors and to perform methods according to the present invention without undue experimentation. In typical embodiments, the probe-defined nucleic acid fragments include at least 5 different nucleic acid fragments, e.g. at least 10 different nucleic acid fragments, at least 50 different nucleic acid fragments, at least 100 different nucleic acid fragments, at least 500 different nucleic acid fragments, at least 1000 different nucleic acid fragments, or at least 5000 different nucleic acid fragments, or more. In certain embodiments the probe-defined nucleic acid fragments include up to about 100,000 or more different nucleic acid fragments, e.g. up to about 50,000 different nucleic acid fragments, up to about 10,000 different nucleic acid fragments, up to about 50000 different nucleic acid fragments, or more.

Thus, in an embodiment of a method in accordance with the present invention, the method includes, after performing the hybridization reaction, recovering probe-defined nucleic acid fragments, wherein said recovering comprises separating the complementary fragments from the solid support. As described above, in particular embodiments, separating the complementary fragments from the solid support may include eluting the fragments with an elution buffer or may include cleaving the probes from the solid support to release the complementary fragments. In certain embodiments, after the complementary fragments are separated from the solid support, they may be purified and/or amplified to provide the probe-defined nucleic acid fragments.

In particular embodiments, the EDSCR process may be repeated multiple times (e.g. two times, three times, four times, five times or more) depending upon the required purity of the fragment pool and/or the sequence complexity of the starting sample and final desired mixture. The EDSCR process may be performed in a serial manner using a defined set of microarrays wherein each successive member of the defined set comprises a defined subset of probes from the previous microarray.

In certain embodiments, the method of the present invention further includes, after recovering the probe-defined nucleic acid fragments, performing a second hybridization reaction. In the second hybridization reaction, the probe-defined nucleic acid fragments are contacted with a second support having a second probe set bound thereto to result in probe-defined nucleic acid fragments being retained on the second support via the second probe set. In embodiments in which a second hybridization reaction is performed, the method of the present invention further includes separating the probe-defined nucleic acid fragments retained on the second support from the second support.

In typical embodiments, the probe-defined nucleic acid fragments retained on the second support are hybridized to the second probe set under stringent conditions. The probe-defined nucleic acid fragments retained on the second support are typically washed under stringent conditions to remove non-specifically bound fragments. Separating the probe-defined nucleic acid fragments retained on the second support from the second support provides probe-defined nucleic acid fragments which are defined by the second probe set (i.e. the probes in the second probe set are the probes providing the “probe-defined” aspect of the probe-defined nucleic acid fragments separated from the second support). The second support may have essentially the same form and composition as the solid support or may have any other suitable form and composition disclosed herein with respect to the solid support. The second probe set will typically be a subset of the probes bound to the solid support. The second probe set will typically provide for reduction in sequence complexity of the probe-defined nucleic acid fragments separated from the second support as compared to the probe-defined nucleic acid fragments separated from the solid support.

As indicated above, the EDSCR process may be repeated multiple times, e.g. 3, 4, 5, or more times. In such embodiments, the method will include hybridizing to a third support having a third probe set bound thereto, a fourth support having a fourth probe set bound thereto, and a fifth support having a fifth probe set bound thereto, respectively. Each probe set will typically include a reduced set of probes from the previous iteration. The method in such embodiments includes conditions and components analogous to the earlier description pertaining to the solid support having the probes bound thereto, and performance of the methods of such embodiments will be apparent given the disclosure herein.
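
The serial use of successively smaller probe sets can be pictured with a small set-based sketch. Here each fragment and each probe is represented only by the identifier of the sequence element it corresponds to, and capture is treated as ideal (every targeted fragment is retained, nothing else is). The function name `run_serial_rounds`, the `require_nested` flag, and the example numbers are assumptions made for illustration; the text says only that each probe set is typically a subset of the previous one.

```python
# Idealized model of repeated capture rounds with nested probe sets.
# Elements are abstract integer IDs; capture is assumed to be perfect.


def run_serial_rounds(fragment_ids, probe_sets, require_nested=True):
    """Return (final_pool, pool_size_after_each_round).

    fragment_ids: iterable of element IDs present in the starting sample.
    probe_sets:   list of iterables, one probe set per capture round.
    """
    pool = set(fragment_ids)
    sizes = []
    previous = None
    for round_number, probes in enumerate(probe_sets, start=1):
        probes = set(probes)
        if require_nested and previous is not None and not probes <= previous:
            raise ValueError(
                f"probe set {round_number} is not a subset of the previous set"
            )
        pool &= probes          # only targeted elements are retained
        sizes.append(len(pool))
        previous = probes
    return pool, sizes


starting_sample = range(1000)                        # 1000 distinct elements
probe_sets = [range(500), range(100), range(10)]     # three nested probe sets
final_pool, sizes = run_serial_rounds(starting_sample, probe_sets)
print("distinct elements after each round:", sizes)
```

In practice each round is imperfect, which is one reason repeated rounds may be used to reach a required purity; this sketch does not model carry-over of non-target fragments.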

Microarray Probe Design Considerations:

The nucleotide length and number of microarray probes (feature density) will depend upon a variety of issues including the sequence complexity of the starting sample mixture, the overall average fragment length and the requirements of the downstream application in which the reduced complexity mixture will be used. For example, EDSCR probes can be designed to investigate specified “target-types”, such as coding regions, transcription control regions, introns, intron-exon boundary regions, methylation sites, recombination sites and so forth.

While each downstream application (library creation, sequencing, etc.) or specified target-type will likely bring application or sequence element-specific criteria to the probe design, there exist some general considerations. The probe length should be sufficient to ensure specific hybridization within the boundaries of the sequence complexity of the sample mixture. In preferred embodiments, the probe lengths will be between 10 and 100 nucleotides, and more preferably between 20 and 60 nucleotides. The exact location and length of each probe with respect to either a random set of fragments or a defined restriction fragment will be determined using standard accepted design methods which account for base-pairing specificity and stability, internal probe and fragment structures, repetitive elements and potential modifications.

For many applications, it will be desirable that the probes not be too specific so that fragments which contain both known and unknown nucleotide variations (e.g. SNPs, deletions and insertions) within the region complementary to the probe can be isolated (see below). The rate of single nucleotide variation within the human genome is estimated at approximately 1/1250 (Venter, C. J. et al., (2001) Science 291, 1304). For example, a 20-mer probe would be sufficient to ensure the necessary fragment specificity within a mixture having a sequence complexity equal to that of the human genome. However, it is likely that any single nucleotide variation, deletion or insertion within the complementary region of the targeted fragment would prevent stable hybridization at the required fragment-specific stringency. In contrast, 60-mer probes would possess the required fragment specificity while allowing for some degree of mismatches and/or deletions, thereby enabling the isolation of fragments having unknown variations at any location within the fragment.
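
The trade-off between probe length, specificity and tolerance of variation can be made concrete with two back-of-envelope estimates. The sketch below assumes a random genome with equal base frequencies (real genomes contain repeats, so this understates chance matches) and treats the quoted rate of approximately 1/1250 as an independent per-base probability of variation; both assumptions, and the function names, are illustrative rather than taken from the source.

```python
# Back-of-envelope estimates for probe length selection.
# Assumes random sequence with equal base frequencies and independent
# per-base variation; both are simplifications.

GENOME_BP = 3.2e9          # human genome size quoted in the text
VARIATION_RATE = 1 / 1250  # single nucleotide variation rate quoted in the text


def expected_chance_matches(probe_len: int, genome_bp: float = GENOME_BP) -> float:
    """Expected number of exact chance matches to one probe, both strands."""
    return 2 * genome_bp / 4 ** probe_len


def prob_variant_in_probe_region(probe_len: int, rate: float = VARIATION_RATE) -> float:
    """Probability that at least one variant falls within the probe region."""
    return 1 - (1 - rate) ** probe_len


for k in (10, 20, 60):
    print(
        f"{k}-mer: expected chance matches = {expected_chance_matches(k):.3g}, "
        f"P(variant in probe region) = {prob_variant_in_probe_region(k):.3f}"
    )
```

Under these assumptions a 20-mer is already expected to have far fewer than one chance match in a genome-sized mixture, consistent with the statement above, while the chance that a targeted region carries a variation grows roughly in proportion to probe length, which is why mismatch tolerance matters more for longer probes.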

Both the overall fragment length of the starting sample mixture and the feature density of the microarray will determine the sequence complexity of the final mixture of probe-defined nucleic acid fragments isolated using the method of the present invention. It has been shown that single nucleotide specificity can be achieved in a microarray format for sample mixtures having a sequence complexity approaching 500,000,000 base-pairs (Kennedy et al., (2003), Nat. Biotechnol. 21; 1233). Moreover, recent CGH experiments with various types of microarrays indicate that fragment-specific hybridization can be achieved from samples having a sequence complexity equal to that of the entire human genome (3.2×10⁹ base pairs) (see: U.S. patent application Ser. No. 10/744,595 to Bruhn et al.; filed Dec. 22, 2003). EDSCR is therefore useful for reducing the sequence complexity of the entire human genome (or equivalent) to any lower value in an individual element-defined manner.

In preferred embodiments, the microarray will comprise between 100 and 500,000 probes (features) where each probe comprises a sequence element that is complementary to a defined set of fragments within the sample mixture. The number of probes will depend upon the sequence complexity of the starting sample, the desired level of sequence complexity reduction and the particular downstream application in which the fragment mixture will be used.

By way of example, an array can be designed to isolate a set of fragments corresponding to a defined set of genes from a human genome sample. The mean length of an entire human gene including exons and introns is about 14,000 base-pairs. However, while the length of an exon can range from only a few base-pairs to ~500 base-pairs, the mean exon length is only about 120 base-pairs. The mean number of exons per gene is about 7, giving a mean coding region of approximately 1,000 base-pairs (Lander E. S et al., (2001) Nature, 409, 860). Given this average gene architecture, the minimal number of probes per average gene would be 7; one for each exon. However, it may be desirable to have sufficient redundancy to ensure capture of the entire gene including intron-exon boundaries. It may also be desirable to include probes for both strands of the genome. Thus in a preferred embodiment for the capture of a defined coding region of a gene, the number of probes per average gene would range between 7 and 30 depending upon the required coverage. Thus, a microarray comprising 100,000 defined probes (30 per gene) could capture >3,000 genes from the human genome. This would correspond to approximately 10% of the total coding region of the human genome (Lander E. S et al., (2001) Nature, 409, 860, Venter, C. J. et al., (2001) Science 291, 1304). Likewise, a much smaller microarray having only ~10,000 features could capture the coding region for ~300 genes (or more, depending on the number of probes per gene).

Another example of a particular array design would be one directed at isolating a defined set of expressed genes (mRNAs) from a biological sample. In this case, the probes would be directed to a defined set of cDNAs generated from an mRNA mixture. In a preferred embodiment, the cDNAs would be full-length copies. The cDNAs comprise a contiguous set of exons with no interruption by introns, and would therefore have an average length of approximately 1,000 nucleotides (Lander E. S et al., (2001) Nature, 409, 860). The number of probes per gene would therefore depend upon the total sequence complexity of the starting cDNA sample and the degree to which the cDNA mixture is fragmented. A minimum number of probes per cDNA would be one, but more preferably 3 and most preferably 6. Thus, in the case of 6 probes per gene, a microarray comprising 100,000 probes could isolate >15,000 cDNA species, which is equivalent to the actual number of individual mRNA species that is expressed in a cell at any given time. As discussed above, the number of probes and their location with respect to the desired genomic region will be dictated by the ability to ensure specific hybridization within the sequence complexity of the starting sample.
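
The probe budgets quoted in the two array-design examples above follow from simple division, which the sketch below reproduces. It assumes every gene or cDNA receives the same number of probes and that no features are reserved for controls; the function name `targets_covered` is hypothetical.

```python
# Probe-budget arithmetic for the genomic and cDNA array examples.
# Assumes a uniform number of probes per target and no control features.

MEAN_EXONS_PER_GENE = 7   # figure quoted in the text
MEAN_EXON_BP = 120        # figure quoted in the text


def targets_covered(total_probes: int, probes_per_target: int) -> int:
    """Whole number of genes (or cDNAs) a probe budget can cover."""
    if probes_per_target <= 0:
        raise ValueError("probes_per_target must be positive")
    return total_probes // probes_per_target


mean_coding_bp = MEAN_EXONS_PER_GENE * MEAN_EXON_BP
print("mean coding region from exon figures:", mean_coding_bp, "bp")

print("genes, 100,000 probes at 30 per gene:", targets_covered(100_000, 30))
print("genes, 100,000 probes at 7 per gene: ", targets_covered(100_000, 7))
print("genes, 10,000 probes at 30 per gene: ", targets_covered(10_000, 30))
print("cDNAs, 100,000 probes at 6 per cDNA: ", targets_covered(100_000, 6))
```

The exon figures multiply out to somewhat under the approximately 1,000 base-pair mean coding region cited above, which is consistent with the quoted means being rounded values.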

EXAMPLES

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of synthetic organic chemistry, biochemistry, molecular biology, and the like, which are within the skill of the art. Such techniques are explained fully in the literature.

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to perform the methods and use the compositions disclosed and claimed herein. Efforts have been made to ensure accuracy with respect to numbers (e.g., amounts, temperature, etc.) but some errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, temperature is in ° C. and pressure is at or near atmospheric. Standard temperature and pressure are defined as 20° C. and 1 atmosphere.

Use of EDSCR for Genetic Variation Discovery:

Single nucleotide polymorphisms (SNPs) are powerful genetic markers for associating specific genetic alleles with complex traits or disease. Genome-wide association studies generally rely upon a universal set of SNPs that are relatively frequent (>10%) among all or most ethnic populations. However, it is becoming increasingly clear that many disease causing or disease associated alleles are rare within the entire population, often being restricted to a particular ethnic group, defined cohort or closely related families.

Faham and coworkers (Human Molecular Genetics. (2001) 10:1657-1664) have developed a powerful method to perform high-throughput scanning of DNA sequence variations. This method, termed “Mismatch Repair Detection” (MRD), leverages the mismatch repair system of E. coli to select, for a pre-defined set of genomic fragments derived from a pool of individuals, only those fragments that contain variant sequences with respect to a reference standard sample. MRD exploits the ability of bacterial cells to co-repair long stretches of DNA in a manner that specifically selects for only those transformants that possessed a sequence variant.

A critical step in the MRD process requires the formation of heteroduplexes between a complex pool of single stranded standards comprising the genomic regions of interest and their complements derived from a pooled sample of many individuals, which may contain SNPs or other types of sequence variations. The present MRD process can accommodate thousands of different genomic fragments having a length between 300 and 500 nucleotides. This would allow millions of bases of genomic sequence to be scanned in a single MRD reaction. Thus it is theoretically possible to scan the entire coding and associated control regions of the human genome (~200 Mb) in just a few separate MRD reactions using current MRD protocols.

However, a current bottleneck in the MRD process is the isolation and amplification of hundreds of thousands of defined genomic fragments used to construct the standard vectors as well as the complementary fragments from the pooled cohort used to perform MRD. The protocol is currently performed, on a much smaller scale, using multiplex PCR to isolate and amplify the desired fragments. However, in order to realize the full potential of MRD, a faster and more cost-effective method for isolating and amplifying these fragments is necessary.

The EDSCR process may be used in conjunction with the MRD application as outlined in FIGS. 2A & 2B. A microarray having probes directed to the desired restriction fragments comprising the genomic elements of interest would be designed and manufactured according to the criteria outlined above. The vectors (plasmids) containing the standard fragment sequences corresponding to the genomic regions of interest would then be generated by performing the EDSCR process according to FIG. 2A using the protocols and conditions outlined above. Briefly, the standard genomic sample 202a would be fragmented (indicated by arrow 104) by digestion with a restriction endonuclease selected to produce the desired average fragment lengths and boundaries. Adaptor duplexes 112 would then be ligated (indicated by arrow 114) to the fragmented DNA 106, incorporating a common yet unique primer site for subsequent amplification. The duplex terminated fragments 116 would then be denatured and contacted (indicated by arrow 124) with the probes 222 on the microarray 220. Non-specifically bound fragments 132 would be washed 134 off the microarray 220 using stringent wash conditions. The complementary fragments 126 would then be separated (indicated by arrow 144) from the microarray 220, e.g. by elution, to give the complementary fragments in solution 146, which are then PCR amplified (indicated by arrow 254) using primers complementary to the common primer site. The resulting probe-defined nucleic acid fragment mixture would then be digested (indicated by arrow 264) with the original restriction endonuclease to recreate the original cohesive ends and then ligated (indicated by arrow 274) into the linearized pMRD100 plasmid 272 (or equivalent) to create the final standard vector pool 276.

Referring to FIG. 2B, the pooled experimental cohort genomic DNA 202b comprising a mixture of between 10 and 1,000 individuals would then be fragmented 104 by digestion with the same restriction endonuclease as above. Adaptor duplexes 112 would then be ligated 114 to the fragmented DNA 106, incorporating a common yet unique primer site for subsequent amplification. The duplex terminated fragments 116 would then be denatured and contacted 124 with the probes 222 on the microarray 220. Non-specifically bound fragments 132 would be washed 134 off the microarray 220 using stringent wash conditions. The complementary fragments 126 would then be separated 144 from the microarray 220, e.g. by elution, to give the complementary fragments in solution 146, which are then PCR amplified (indicated by arrow 254) using primers complementary to the common primer site. The resulting probe-defined nucleic acid fragment mixture 256 is then used in combination with the standard vector pool 276 (see FIG. 2A) to perform the MRD process 280 as outlined by Faham and coworkers (Human Molecular Genetics. (2001) 10:1657-1664).

Use of EDSCR for Ultra High Throughput Sequencing:

A number of novel methods for ultra high throughput DNA sequencing are currently under development. Many of these emerging technologies are based on the analysis of single nucleic acid molecules or clonally amplified versions thereof (for reviews see: J. Shendure et al., Nature Reviews (2004) 5: 335-345; A. Marziali and M. Akeson, Ann. Rev. Biomed. Engineering (2001) 3: 195-223). These methods include i) the translocation of single nucleic acid molecules through some type of nanopore or nanochannel (Akeson, M., et al., (1999) Biophysics, 77, 3227), or ii) some stepwise chemical or fluorescence-based sequencing of spatially arrayed single molecules or beads containing multiple copies thereof (Levene, M. J. et al., (2003) Science, 299, 682; Braslavsky, I., et al., (2003) Proc. Natl. Acad. Sci. USA, 100, 396; Smith, T., (2004) Drug Discovery Today; Targets, 3, 112). In both types of approaches, a high multiplicity of nucleic acid fragments needs to be sequenced in either a serial or a highly parallel manner. Moreover, most methods require some type of modification of the fragments (e.g. adaptor ligation) in order to facilitate the amplification, surface attachment or the particular sequencing chemistry (e.g. sequencing by synthesis). However, as discussed above, there currently exists no effective method for generating the required defined mixture of nucleic acid fragments starting from a high sequence complexity mixture such as an entire genome.

The present invention, EDSCR, is a novel approach to effectively generate the desired DNA mixture having the required modifications to enable the ultra high throughput sequencing process. For this application, the EDSCR process is carried out essentially as described above and outlined in FIG. 3. A microarray having probes directed to the genomic elements of interest would be designed and manufactured according to the criteria outlined above. A genomic sample 302 would then be randomly fragmented 104 to an average length dictated by the criteria outlined above to provide a mixture of fragmented DNA 306. Adaptor duplexes 112 containing the appropriate sequencing and/or amplification primer site would then be ligated 114 to the mixture. If the sequencing method requires the covalent attachment of the individual DNA fragments to a solid support, then the appropriate chemical moiety (e.g. active amines, aldehydes, esters, biotin) would be incorporated into the adaptor duplex 112. The mixture may then optionally be enriched 324 for those fragments containing the necessary adaptor sequences. These duplex terminated fragments 116 would then be denatured and contacted 124 with the probes 222 on the microarray 220. Non-specifically bound fragments 132 would be washed 134 off the microarray 220 using stringent wash conditions. The complementary fragments 126 would then be separated 144 from the microarray 220, e.g. by elution, to give the complementary fragments in solution 146. If necessary for the desired sequencing application, the complementary fragments in solution 146 may then be amplified in an amplification reaction (indicated by arrow 354), e.g. PCR, LCR, or isothermal amplification, using primers complementary to the common primer site if necessary. The resulting probe-defined nucleic acid fragments 356 may then be used in accordance with the specific process of the sequencing method employed. In various embodiments the sequencing method may involve attaching the probe-defined nucleic acid fragments to a planar surface. In some embodiments, the sequencing method may involve attaching the probe-defined nucleic acid fragments to beads followed by clonal amplification (e.g. via PCR or emulsion PCR). In certain embodiments the probe-defined nucleic acid fragments are sequenced in solution (not bound to a surface).
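
By way of illustration only, the fragmentation, adaptor ligation, probe capture and wash steps described above may be modeled in silico. The following Python sketch is a toy model and not part of the disclosed method. The adaptor sequence, the function names and the use of exact substring matching as a stand-in for hybridization under stringent conditions are all assumptions made for this example.

```python
import random

# Hypothetical common primer-site sequence; not taken from the disclosure.
ADAPTOR = "GATTACAGG"


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def fragment(genome, mean_len, rng):
    """Cut the genome at random into pieces averaging roughly mean_len bases."""
    frags, pos = [], 0
    while pos < len(genome):
        size = max(1, int(rng.expovariate(1.0 / mean_len)))
        frags.append(genome[pos:pos + size])
        pos += size
    return frags


def ligate(frags):
    """Attach the adaptor to both ends of every fragment."""
    return [ADAPTOR + frag + revcomp(ADAPTOR) for frag in frags]


def capture(ligated, probes):
    """Split fragments into those retained by a probe and those washed off.

    Assumption: a fragment is retained if it contains any probe target as an
    exact substring, a crude proxy for hybridization plus a stringent wash.
    """
    retained, washed = [], []
    for frag in ligated:
        if any(probe in frag for probe in probes):
            retained.append(frag)
        else:
            washed.append(frag)
    return retained, washed
```

In this model the retained list corresponds to the complementary fragments recovered by elution, each flanked by the common adaptor and therefore available for amplification from a single primer site. A probe target that happens to span a random break point is not captured, which mirrors the dependence of capture on fragment length noted above.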

While the foregoing embodiments of the invention have been set forth in considerable detail for the purpose of making a complete disclosure of the invention, it will be apparent to those of skill in the art that numerous changes may be made in such details without departing from the spirit and the principles of the invention. Accordingly, the invention should be limited only by the following claims.

All patents, patent applications, and publications mentioned herein are hereby incorporated by reference in their entireties.

1-32. (canceled)
33. A method of nucleic acid sample analysis, comprising: a) fragmenting a sample of genomic DNA from a cell to obtain fragmented double-stranded genomic nucleic acids; b) ligating a double-stranded adaptor duplex comprising a primer site to both ends of said fragmented double-stranded genomic nucleic acids to produce adaptor-ligated nucleic acids; c) hybridizing said adaptor-ligated nucleic acids with nucleic acid probes to result in complementary fragments being retained on a solid support via probes bound to the solid support; d) washing the solid support to provide a washed solid support comprising said complementary fragments; e) subjecting said washed solid support to conditions sufficient to separate all complementary fragments retained on said washed solid support via said probes from the washed solid support to produce a reduced complexity mixture that is made up of said complementary fragments; and f) sequencing said complementary fragments of said mixture.
34. The method of claim 33, wherein the adaptor duplex comprises a reactive moiety for binding the fragmented nucleic acids to a substrate having a surface-bound complementary active moiety for binding with the reactive moiety.
35. The method of claim 33, wherein the contacting is performed under stringent conditions.
36. The method of claim 33, wherein the solid support is a microarray having the nucleic acid probes bound thereto.
37. The method of claim 36, wherein the solid support comprises a planar support comprising one or more substrate materials selected from glass, silicas, metals, teflons, and polymeric materials.
38. The method of claim 33, wherein the solid support comprises beads.
39. The method of claim 38, wherein the beads comprise one or more substrate materials selected from nitrocellulose, glass, silicas, teflons, metals, and polymeric materials.
40. The method of claim 36, wherein said solid support comprises from 2 to 2,000,000 probes.
41. The method of claim 36, wherein the nucleic acid probes are bound to said solid support in features that are at a density of up to 100,000 per 1 mm².
42. The method of claim 33, wherein 10 to 100,000 different probes are bound to the solid support.
43. The method of claim 33, wherein between 100 and 500,000 different probes are bound to the solid support.
44. The method of claim 33, further comprising enriching said adaptor-ligated nucleic acids.
45. The method of claim 33, further comprising amplifying said complementary fragments.
46. The method of claim 45, wherein amplifying the complementary fragments comprises performing an amplification reaction selected from PCR, LCR, or isothermal amplification.
47. The method of claim 33, wherein said conditions of step e) comprise elution of the complementary fragments retained on said washed solid support.
48. The method of claim 47, wherein said elution comprises washing in a buffer at a pH of 8 to 9.5 and a temperature of 70° C. to 95° C.
49. The method of claim 33, wherein said conditions of step e) are sufficient to cleave said probes from the washed solid support.